AITopics | data pre-processing


data pre-processing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

cfa8440d500a6a6867157dfd4eaff66e-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-19-2025, 02:22:36 GMT

artificial intelligence, layer layer index index, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments

Li, Changye, Xu, Weizhe, Cohen, Trevor, Michalowski, Martin, Pakhomov, Serguei

arXiv.org Artificial Intelligence | Mar-14-2023

Evidence is growing that machine and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment, such as dementia, and that of cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to make significant advances in this area. However, due to variability in approaches and data selection strategies used by various researchers, results obtained by different groups have been difficult to compare directly. In this paper, we present TRESTLE (Toolkit for Reproducible Execution of Speech, Text and Language Experiments), an open source platform that focuses on two datasets from the TalkBank repository with dementia detection as an illustrative domain. Successfully deployed in the hackallenge (Hackathon/Challenge) of the International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a precise digital blueprint of the data pre-processing and selection strategies that can be reused via TRESTLE by other researchers seeking comparable results with their peers and current state-of-the-art (SOTA) approaches.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2302.07322

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.29)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Wisconsin (0.05)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Dementia (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)


Data pre-processing for Machine Learning in Python

#artificialintelligence | Apr-25-2022, 14:54:55 GMT

Data pre-processing refers to the steps applied to make data more suitable for data mining. In this course, we are going to focus on pre-processing techniques for machine learning. Pre-processing is the set of manipulations that transform a raw dataset so that it can be used by a machine learning model. It is needed to make data suitable for some machine learning models, to reduce dimensionality, to better identify the relevant data, and to increase model performance. It is the most important part of a machine learning pipeline and can strongly affect the success of a project.
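As a rough illustration of what such manipulations can look like, the sketch below fills a missing numeric value with the column mean and one-hot encodes a categorical column, using only the Python standard library. The toy records, the column names (`age`, `city`), and the helper names `impute_mean` and `one_hot` are assumptions made for this example; they are not taken from the course.

```python
from statistics import mean

# Hypothetical raw records: one numeric column with a gap, one categorical column.
rows = [
    {"age": 20.0, "city": "Rome"},
    {"age": None, "city": "Paris"},
    {"age": 40.0, "city": "Rome"},
]

def impute_mean(records, column):
    """Return copies of the records with missing values in `column` set to the column mean."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]

def one_hot(records, column):
    """Return copies of the records with `column` replaced by one 0/1 indicator per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded

# Impute first, then encode, so every row ends up fully numeric.
clean_rows = one_hot(impute_mean(rows, "age"), "city")
for row in clean_rows:
    print(row)
```

In practice a library such as pandas or scikit-learn would usually handle these steps; the plain-Python version is only meant to make the transformations visible.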

data pre-processing, machine learning, pre-processing technique, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (0.44)


Top 10 Data Preparation Techniques to Use in ML Projects

#artificialintelligence | Apr-6-2022, 12:54:44 GMT

Data preparation is the process of cleaning and transforming raw data prior to processing and analysis so that data scientists and analysts can run it through machine learning algorithms to uncover insights or make predictions. It may be one of the most difficult steps in any ML project. ML depends heavily on data: data is what makes algorithm training possible, and it helps explain why machine learning has become so popular in recent years. Here are some important techniques for ML projects. First, acquire the relevant dataset needed to build and develop machine learning models.

data pre-processing, data preparation technique, dataset, (8 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


The Pain Points Of Scaling Data Science - Liwaiwai

#artificialintelligence | Aug-10-2021, 03:30:05 GMT

When building a machine learning model, data scaling is one of the most significant steps in data pre-processing. Scaling can make the difference between a weak machine learning model and a stronger one. Machine learning algorithms see only numbers, so when features differ widely in magnitude, say some ranging in the tens or hundreds and others in the thousands, the features with the larger values tend to play a more significant role in training the model if the data is used without scaling. Data scaling is also important for algorithms that calculate distances between data points, so that each variable is weighed by its meaning rather than by its magnitude relative to an arbitrary lower-valued variable. Another reason to scale data is that some algorithms perform better with scaled data than without it, such as neural network nonlinear regression.
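To make the idea concrete, here is a minimal sketch of two common scaling schemes, min-max scaling and standardization (z-scores), written with the Python standard library only. The article excerpt does not say which scaling method it has in mind, so the choice of these two is an assumption, as are the function names and the toy `income` and `age` values.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical features on very different scales.
income = [20000.0, 40000.0, 60000.0]
age = [20.0, 30.0, 40.0]

scaled_income = min_max_scale(income)
scaled_age = min_max_scale(age)
z_age = standardize(age)
print(scaled_income, scaled_age, z_age)
```

After min-max scaling, both toy features occupy the same [0, 1] range, so neither dominates a distance calculation simply because its raw values are larger.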

library, pain point, scaling data science, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)


Master Data Science with Python in 10 Hours

#artificialintelligence | May-26-2021, 10:52:37 GMT

With this course, you will learn the basics of Python and its most popular libraries for Data Science, such as NumPy, pandas, Matplotlib, and Seaborn. You will learn the important tools and knowledge for Data Science in more than 60 lectures and practice your new skills in 4 large exercise sections with more than 85 exercise questions, all using one of the most popular programming languages: Python. Data pre-processing is a very important stage of the Machine Learning workflow. In this course, you will learn how to import, check, and clean data as part of data pre-processing for Machine Learning, and how to visualize data and communicate your results using plots. This course will help you jump-start your career or take your first big step into Data Science and Machine Learning, which are popular fields with many attractive job opportunities.

data science, master data science, python, (7 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.82)


An Explainable Probabilistic Classifier for Categorical Data Inspired to Quantum Physics

Guidotti, Emanuele, Ferrara, Alfio

arXiv.org Artificial Intelligence | May-26-2021

This paper presents Sparse Tensor Classifier (STC), a supervised classification algorithm for categorical data inspired by the notion of superposition of states in quantum physics. By regarding an observation as a superposition of features, we introduce the concept of wave-particle duality in machine learning and propose a generalized framework that unifies the classical and the quantum probability. We show that STC possesses a wide range of desirable properties not available in most other machine learning methods but it is at the same time exceptionally easy to comprehend and use. Empirical evaluation of STC on structured data and text classification demonstrates that our methodology achieves state-of-the-art performances compared to both standard classifiers and deep learning, at the additional benefit of requiring minimal data pre-processing and hyper-parameter tuning. Moreover, STC provides a native explanation of its predictions both for single instances and for each target label globally. All the code is released at https://sparsetensorclassifier.org

algorithm, explanation, probability, (17 more...)

arXiv.org Artificial Intelligence

2105.13988

Country:

North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(7 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)


The EpiBench Platform to Propel AI/ML-based Epidemic Forecasting: A Prototype Demonstration Reaching Human Expert-level Performance

Srivastava, Ajitesh, Xu, Tianjian, Prasanna, Viktor K.

arXiv.org Artificial Intelligence | Feb-4-2021

During the COVID-19 pandemic, a significant effort has gone into developing ML-driven epidemic forecasting techniques. However, benchmarks do not exist to establish whether a new AI/ML technique is better than the existing ones. The "covid-forecast-hub" is a collection of more than 30 teams, including us, that submit their forecasts weekly to the CDC. It is not possible to declare whether one method is better than another using those forecasts, because each team's submissions may correspond to different techniques over the period and involve human interventions as the teams continuously change and tune their approaches. Such forecasts may be considered "human-expert" forecasts and do not qualify as AI/ML approaches, although they can be used as an indicator of human expert performance. We are interested in supporting AI/ML research in epidemic forecasting which can lead to scalable forecasting without human intervention. Which modeling technique, learning strategy, and data pre-processing technique work well for epidemic forecasting is still an open problem. To help advance the state-of-the-art AI/ML applied to epidemiology, a benchmark with a collection of performance points is needed and the current "state-of-the-art" techniques need to be identified. We propose EpiBench, a platform consisting of community-driven benchmarks for AI/ML applied to epidemic forecasting, to standardize the challenge with a uniform evaluation protocol. In this paper, we introduce a prototype of EpiBench which is currently running and accepting submissions for the task of forecasting COVID-19 cases and deaths in the US states, and we demonstrate that we can utilize the prototype to develop an ensemble relying on fully automated epidemic forecasts (no human intervention) that reaches the level of the human-expert ensemble currently being used by the CDC.

epibench, forecast, forecasting, (15 more...)

arXiv.org Artificial Intelligence

2102.02842

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Poland (0.04)
Europe > Germany (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Public Health (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Natural Language Processing Made Simpler with 4 Basic Regular Expression Operators!

#artificialintelligence | Oct-19-2020, 13:10:21 GMT

Let us now look in more detail at how to use this module with the following text sample, and at how exactly the re module can perform the various operations required for processing and parsing text data. I just made up a random text sample with some random irregular sentences. You can use the same sentence as me or make up your own random sentence and follow along. With the four functions above, almost any natural language task or pre-processing step on text data can be done. So, without further ado, let us start analyzing each of these functions and how they can be utilized.
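The excerpt does not name the four functions or include the text sample, so the sketch below is only a guess at the kind of workflow described. It assumes a commonly used set of four functions from Python's `re` module (`re.search`, `re.findall`, `re.sub`, `re.split`) and uses a made-up sample string.

```python
import re

# Hypothetical text sample, made up for this sketch.
text = "Hello!!  This is   sample text, written in 2020... with 4 odd bits."

# re.search: locate the first match anywhere in the string.
first_number = re.search(r"\d+", text)

# re.findall: collect every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", text)

# re.sub: replace matches; here, drop everything except letters, digits and whitespace.
no_punct = re.sub(r"[^A-Za-z0-9\s]", "", text)

# re.split: break the string on a pattern; here, on runs of whitespace.
tokens = re.split(r"\s+", no_punct.strip().lower())

print(first_number.group(), numbers, tokens)
```

Chaining `re.sub` and `re.split` in this way gives a very simple tokenizer: punctuation is removed first, then the lowercased text is split into words.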

artificial intelligence, natural language, operation, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)


End-to-End Machine Learning in JavaScript Using Danfo.js and TensorFlow.js (part 3)

#artificialintelligence | Sep-18-2020, 07:05:59 GMT

This is the third and final part of a three-part series. I suggest you read parts 1 and 2 first for better understanding. In the first part of the series, we got introduced to danfo.js, a new JavaScript package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. The second part dealt extensively with data pre-processing for model building, training, and evaluation with TensorFlow.js and danfo.js in an Observable notebook. In Pythonic data science end-to-end projects, notebooks are converted into scripts during deployment or package building.

javascript, machine learning, programming language, (9 more...)

#artificialintelligence

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
